65 research outputs found

    Protein complex prediction based on k-connected subgraphs in protein interaction network

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein complexes play an important role in cellular mechanisms. Recently, several methods have been presented to predict protein complexes in a protein interaction network. In these methods, a protein complex is predicted as a dense subgraph of protein interactions. However, interactions data are incomplete and a protein complex does not have to be a complete or dense subgraph.</p> <p>Results</p> <p>We propose a more appropriate protein complex prediction method, CFA, that is based on connectivity number on subgraphs. We evaluate CFA using several protein interaction networks on reference protein complexes in two benchmark data sets (MIPS and Aloy), containing 1142 and 61 known complexes respectively. We compare CFA to some existing protein complex prediction methods (CMC, MCL, PCP and RNSC) in terms of recall and precision. We show that CFA predicts more complexes correctly at a competitive level of precision.</p> <p>Conclusions</p> <p>Many real complexes with different connectivity level in protein interaction network can be predicted based on connectivity number. Our CFA program and results are freely available from <url>http://www.bioinf.cs.ipm.ir/softwares/cfa/CFA.rar</url>.</p

    SSP: An interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads

    Get PDF
    AbstractRecent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp

    Impact of RNA structure on the prediction of donor and acceptor splice sites

    Get PDF
    BACKGROUND: gene identification in genomic DNA sequences by computational methods has become an important task in bioinformatics and computational gene prediction tools are now essential components of every genome sequencing project. Prediction of splice sites is a key step of all gene structural prediction algorithms. RESULTS: we sought the role of mRNA secondary structures and their information contents for five vertebrate and plant splice site datasets. We selected 900-nucleotide sequences centered at each (real or decoy) donor and acceptor sites, and predicted their corresponding RNA structures by Vienna software. Then, based on whether the nucleotide is in a stem or not, the conventional four-letter nucleotide alphabet was translated into an eight-letter alphabet. Zero-, first- and second-order Markov models were selected as the signal detection methods. It is shown that applying the eight-letter alphabet compared to the four-letter alphabet considerably increases the accuracy of both donor and acceptor site predictions in case of higher order Markov models. CONCLUSION: Our results imply that RNA structure contains important data and future gene prediction programs can take advantage of such information

    A pairwise residue contact area-based mean force potential for discrimination of native protein structure

    Get PDF
    Abstract Background Considering energy function to detect a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation dependent contact area, a new type of knowledge-based mean force potential has been developed. Results We developed a new approach to calculate a knowledge-based potential of mean-force, using pairwise residue contact area. To test the performance of our approach, we performed it on several decoy sets to measure its ability to discriminate native structure from decoys. This potential has been able to distinguish native structures from the decoys in the most cases. Further, the calculated Z-scores were quite high for all protein datasets. Conclusions This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield</p

    A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete.</p> <p>Results</p> <p>In this work, we tried to revisit this relatively neglected phenomenon. Several questions are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments? (3) In case of conserved palindromes, is it always the case that the whole conserved pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than randomly generated palindromes. Therefore, one can say that palindromes occur frequently in low complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of finding palindromes which play some roles in protein structure and function. In fact, we found several palindromes that overlap with conserved protein Blocks of different functions. However, in many cases we failed to find any symmetry in the conserved regions of corresponding Blocks. Furthermore, to answer the last two questions, the structural characteristics of palindromes were studied. It is shown that palindromes may have a great propensity to form α-helical structures. Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities.</p> <p>Conclusion</p> <p>We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in the protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation rules out the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with a certain structural fold or protein domain.</p

    ADRML: anticancer drug response prediction using manifold learning

    No full text
    Abstract One of the prominent challenges in precision medicine is to select the most appropriate treatment strategy for each patient based on the personalized information. The availability of massive data about drugs and cell lines facilitates the possibility of proposing efficient computational models for predicting anticancer drug response. In this study, we propose ADRML, a model for Anticancer Drug Response Prediction using Manifold Learning to systematically integrate the cell line information with the drug information to make accurate predictions about drug therapeutic. The proposed model maps the drug response matrix into the lower-rank spaces that lead to obtaining new perspectives about cell lines and drugs. The drug response for a new cell line-drug pair is computed using the low-rank features. The evaluation of ADRML performance on various types of cell lines and drug information, in addition to the comparisons with previously proposed methods, shows that ADRML provides accurate and robust predictions. Further investigations about the association between drug response and pathway activity scores reveal that the predicted drug responses can shed light on the underlying drug mechanism. Also, the case studies suggest that the predictions of ADRML about novel cell line-drug pairs are validated by reliable pieces of evidence from the literature. Consequently, the evaluations verify that ADRML can be used in accurately predicting and imputing the anticancer drug response
    • 

    corecore